Learning Reactive Policies for Probabilistic Planning Domains
نویسندگان
چکیده
We present a planning system for selecting policies in probabilistic planning domains. Our system is based on a variant of approximate policy iteration that combines inductive machine learning and simulation to perform policy improvement. Given a planning domain, the system iteratively improves the best policy found so far until no more improvement is observed or a time limit is exceeded. Though this process can be computationally intensive, the result is a reactive policy, which can then be used to quickly solve future problem instances from the planning domain. In this way, the resulting policy can be viewed as a domain-specific reactive planner for the planning domain, though it is discovered with a domainindependent technique. Thus, the initial cost of finding the policy is amortized over future problem-solving experience in the domain. Due to the system’s inductive nature, there are no performance guarantees for the selected policies. However, empirically our system has shown state-of-the-art performance in a number of benchmark planning domains, both deterministic and stochastic.
منابع مشابه
Policy-Gradient Methods for Planning
Probabilistic temporal planning attempts to find good policies for acting in domains with concurrent durative tasks, multiple uncertain outcomes, and limited resources. These domains are typically modelled as Markov decision problems and solved using dynamic programming methods. This paper demonstrates the application of reinforcement learning — in the form of a policy-gradient method — to thes...
متن کاملDiscrepancy Search with Reactive Policies for Planning
We consider a novel use of mostly-correct reactive policies. In classical planning, reactive policy learning approaches could find good policies from solved trajectories of small problems and such policies have been successfully applied to larger problems of the target domains. Often, due to the inductive nature, the learned reactive policies are mostly correct but commit errors on some portion...
متن کاملLearning to Plan Probabilistically
This paper discusses the learning of probabilistic planning without a priori domain-specific knowledge. Different from existing reinforcement learning algorithms that generate only reactive policies and existing probabilistic planning algorithms that requires a substantial amount of a priori knowledge in order to plan, we devise a two-stage bottom-up learning-to-plan process, in which first rei...
متن کاملAction Schema Networks: Generalised Policies with Deep Learning
In this paper, we introduce the Action Schema Network (ASNet): a neural network architecture for learning generalised policies for probabilistic planning problems. By mimicking the relational structure of planning problems, ASNets are able to adopt a weight sharing scheme which allows the network to be applied to any problem from a given planning domain. This allows the cost of training the net...
متن کاملProbabilistic Programming for Planning Problems
Probabilistic programing is an emerging field at the intersection of statistical learning and programming languages. An appealing property of probabilistic programming languages (PPL) is their support for constructing arbitrary probability distributions. This allows one to model many different domains and solve a variety of problems. We show the link between probabilistic planning and PPLs by i...
متن کامل